A study was made to ascertain how large a sample is
needed to make media effectiveness decisions which are generalizable
to the total educable mentally handicapped (EMH) population. The
method employed in the study involved pretesting and posttesting a
sample of 70 primary and intermediate EMH children on the content of 
a filmstrip. Statistical analyses of the data indicated that samples 
of five students gave results that were within the parameters of
decision established by the Computer Based Project (Syracuse, N.Y.).
When the sample size was increased to 10, the standard findings for
increased sample size were supported, i.e., scores were within
smaller ranges, variance between groups was reduced, and gains were
more standardized. However, the investigators concluded that samples
of five subjects seem to be large enough to establish estimates of
population parameters within the limits of four out of five times.
Larger samples do not add appreciable data or substantially change
the outcome of decisions obtained from the samples of five.
(Author/HCM) 
VERIFYING SAMPLE SIZE CONCERNS
ABSTRACT 

The purpose of this study was to ascertain how large a sample
is needed to make media effectiveness decisions which are
generalizable to the total educable mentally handicapped (EMH)
population. The literature regarding sample size consideration was
reviewed. The method employed in this study involved pretesting and
posttesting a sample of seventy primary and intermediate EMH children
on the content of a filmstrip. Statistical analyses of the data
indicated that samples of five students gave results that were within
the parameters of decision established by the Computer Based Project
(Syracuse, N.Y.). When the sample size was increased to ten, the
standard findings for increased sample size were supported, i.e.,
scores were within smaller ranges, variance between groups was
reduced, and gains were more standardized. However, the investigators
concluded that samples of five subjects seem to be large enough to
establish estimates of population parameters within the limits of
four out of five times. Larger samples do not add appreciable data
or substantially change the outcome of decisions obtained from the
samples of five.
SPECIAL REPORT No. 737
COMPUTER-BASED PROJECT for the EVALUATION
of MEDIA for the HANDICAPPED

Title: VERIFYING SAMPLE SIZE CONCERNS

BACKGROUND 

The Computer Based Project for the Evaluation of Media for the Handicapped
is based on contract #OEC-9-423617-4357 (616) between the Syracuse (N.Y.) City School
District and the Media Services and Captioned Films Branch, Bureau of Education
for the Handicapped (United States Office of Education) for the five year period
July 1, 1969 through June 30, 1974. The major goal is to improve the instruction
of handicapped children through the development and use of an evaluation system
to measure the instructional effectiveness of films and other materials with
educable mentally handicapped (EMH) children; in-service training and media support
for special teachers; and studies related to the evaluation process and the
populations used.

The Project has concentrated on the 600 films and 200 filmstrips from the
Media Services and Captioned Films (BEH - USOE) depository; however, specific
packages from Project LIFE, various elementary math curricula, and selected
progr[...]

[...] model used requires that: 1) objectives of materials be specified and written;
2) instruments be constructed to test and measure effectiveness; and 3) children
be the major sources of evaluation information. A number of instruments and
methodologies are employed in the gathering of cognitive and affective data from
900 EMH children and 80 special teachers to make the effectiveness decisions.
Over half of the EMH population can neither read nor write; therefore, a unique
Student Response System (SRS) is employed, consisting of a twenty-station G.E.-
1000 SRS which can be operated in a group or individual recording mode and is
connected to a remote computer system. The computer capabilities consist of
remote telephone connections to the Rome (N.Y.) Air Development Command, the
Honeywell time-shared network, and the Schenectady (N.Y.) G.E. Research and
Development Center; and batch mode capabilities of the Syracuse City Schools,
Syracuse University, and various commercial sources.

In-service and media support activities provide on-the-job training for
teachers and teacher aides, and equipment and materials to the special teachers in
the city schools. The research activities have centered around investigations
and special problems related to the development of the evaluation model. The
four major areas considered are: 1) testing effects; 2) captioning effects;
3) special student characteristics; and 4) evaluation procedures validation.

Documentation of the major activities appears in the five annual reports
and the 600 evaluations prepared on materials used. Staff members were encouraged
to prepare special reports, and the attached paper is one of these. The opinions
expressed in this publication do not necessarily reflect the position or policy
of the Computer Based Project, the United States Office of Education, or the
Syracuse City School District, and no official endorsement by any of the agencies
should be inferred.
VERIFYING SAMPLE SIZE CONCERNS

The problem has arisen as to how large a sample is needed to make
media effectiveness decisions generalizable to the total EMH population.

In an effort to obtain the best possible effectiveness decision
using the least number of subjects and replications, this study
investigated sample size considerations. This kind of research has major
implications for cost-effectiveness; if fewer subjects are needed to
make the decisions, more evaluations can be run within the same time
parameters using regular Project personnel.

BACKGROUND 

Decisions are currently being made based on a sample of five or
more subjects who make a score of 50% or better correct on a cognitive
measure subsequent to viewing a film or filmstrip, or on subjects who
make a positive gain in a pretest-posttest design. In the latter case,
an arbitrary 20% gain is used as a criterion, attenuated by subjective
considerations where the percentage is close to the criterion.
Some concern and question has arisen as to the efficacy, reliability,
and validity of making decisions on such small sample sizes.

The literature suggests several similar concerns, as indicated in
the following comments.

For standard parametric statistical procedures, Hays (1966) suggests
that, in experiments where there is little "natural" variation in the
materials observed, small samples can give the same power as large
samples in those experiments which have extremely variable conditions.
A number of authors have been more pragmatic and practical; they simply
suggest rather strongly that samples under thirty be avoided, if possible,
when using parametric statistics (Guilford, 1965; Winer, 1954; Cohen, 1969).
Non-parametric statistics are more applicable to samples as small as
N = 3, but groups between 3 and 30 are suggested to give more reliable
results (Siegel, 1956). Chi square concerns suggest that for degrees of
freedom less than 30 the Table of Chi Square be used, but that for over
30 the distribution approximates the normal distribution (Lindquist, 1956,
page 28), and that cell frequencies less than five in over twenty percent
of the cells are to be avoided (Siegel, 1956, page 178). The relationship
of sample size to reliability has been extensively discussed and pointed
out. For example, Cohen relates that "reliability is always dependent
upon sample size" (1969, page 6). He further states, "The greater the
degree to which the means of different samples vary among themselves,
the less any of them can be relied upon" (ibid., page 7). Generally, an
increase in sample size will reduce errors of measurement and thus
increase reliability and the power of the test used (Cohen, 1969, pages 11,
13; Siegel, 1956). The general procedure, then, has been to obtain as large
a sample as possible and administer most of the items to all individuals
in the sample. This usually is an arduous task for both the experimenter
and the subjects if done on a continuing basis.

The use of matrix sampling techniques suggests that it is unnecessary
for all subjects to take all items and that just as valid population
estimates can be obtained from sub-samples of items given to sub-samples
of the population. Immediate concern is raised, then, for the best number
of items in the sub-sample of items given to the best number of subjects.
Hay and Barcikowski (1973) and Shoemaker (1971) recommended that fewer
items (c. 6 items) and several subtests made by single exhaustion of the
item set give the best estimates of means and standard deviations when
biserial correlations are high, i.e., .45 to .95; when the biserial
correlation is not high, i.e., .05 to .35, they recommend larger subtests,
up to half the item population.

The question for this investigation then becomes how large a sample
of respondents should be obtained for the estimates to be stable in four
out of five samples. That is, the probability of making a Type I error
is equal to .20 (Hays, 1966, page 280).

METHOD 

Several filmstrips in the CBP (Bond, 1972) Evaluation System have
been shown to a number of children, who were pretested before
seeing the filmstrip and posttested after seeing it. A set of data for
the filmstrip "Our Hands" was used for this study.

Instrument: The data were the pretest and posttest percent-correct
responses from seventy primary and intermediate EMH children who were
tested with a nine-item multiple-choice instrument before viewing a
24-frame filmstrip, "Our Hands." They were again tested with the
instrument after viewing. The percent of correct answers was computed
for each student.

Sample: From the population of 70 EMH children at primary and
intermediate level who responded to items prepared for the filmstrip
"Our Hands," those scoring 80% to 100% on the pretest were dropped from
the population. Five random samples of five subjects were selected, with
replacement after each sample, from the remaining available population
of 60. The five sample parameters are summarized in Table I. Five random
samples of ten subjects were selected, with replacement after each sample,
from the available population, and sample parameters were computed as
shown in Table II.
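
The selection procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subject ids and pretest percentages are made up, the function name `draw_samples` is hypothetical, and "replacement after each sample" is read as returning a sample's subjects to the pool before the next sample is drawn.

```python
import random

def draw_samples(pretest_scores, n_samples=5, sample_size=5, cutoff=80.0, seed=0):
    """Drop high pretest scorers, then draw repeated random samples.

    Assumption: no subject repeats inside one sample, but all subjects
    go back into the pool before the next sample is drawn.
    """
    rng = random.Random(seed)
    # Keep only subjects who scored below the pretest cutoff (80% here).
    pool = [sid for sid, score in pretest_scores.items() if score < cutoff]
    samples = [rng.sample(pool, sample_size) for _ in range(n_samples)]
    return pool, samples

# Hypothetical data: 70 subject ids with made-up pretest percentages.
scores = {sid: (sid * 37) % 101 for sid in range(70)}
pool, samples = draw_samples(scores)
for label, sample in zip("ABCDE", samples):
    print(label, sorted(sample))
```

Calling `draw_samples(scores, sample_size=10)` would give the corresponding samples of ten.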

Treatment: The pretest and posttest scores for the selected
subjects (Ss) were recorded from the line of student response data and
included as a measure for the sample. The descriptive statistics
(pretest mean and sd, posttest mean and sd, and difference, or gain) were
computed for each sample group of 5 and 10 Ss. These statistics were
also computed for the total population of 60.

A pretest-treatment-posttest model served as the design for this study.
The null hypothesis of no differences between all groups was tested in
a ten-group repeated-measures one-way analysis of variance (ANOVA) for
the sample group of 5 and for the sample group of 10. A Spearman
rank-order correlation (rho) was computed for the samples of 5 and 10 to
indicate the correlation between the pretest and posttest scores.
Test-by-sample interactions were tested using a two-by-five two-way ANOVA
on the five samples at each sample size.
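
As an illustration of the rank-order correlation named above, the following minimal sketch computes Spearman's rho as the Pearson correlation of average ranks. The five pretest and posttest percentages are hypothetical and are not taken from the study data; the helper names are likewise assumptions.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks.

    Undefined (division by zero) if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical scores for one sample of five students.
pretest = [22, 33, 44, 56, 67]
posttest = [44, 33, 67, 78, 89]
rho = spearman_rho(pretest, posttest)
print(round(rho, 2))  # 0.9
```

With only five pairs, a single swap in rank order moves rho by a tenth, which is one reason small-sample rho values vary widely.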

Analysis: The parameters of the total population of 60 were
computed and are shown in both Table I and Table II below. A one-way
analysis of variance was performed on the sample-size-of-5 data (and
the sample of 10) in a ten-group repeated design, testing the null
hypothesis of no differences between groups. For the 5-student samples,
the obtained F = 1.81 was not significant at the p = .05 level, where a
critical value of F(9,40) = 2.124 was necessary to reject the
no-difference hypothesis. At the .80 confidence level, the differences
between sample means need to be 21.6 units or greater. The summary
results are shown in Table I below.



TABLE I
SUMMARY: SAMPLE OF 5 STUDENTS

                    MEAN          STANDARD DEVIATION     %      SIGNIFICANCE*
                                  (at .95 conf. level)
SAMPLE  SIZE  Pretest  Posttest   Pretest  Posttest     Diff    P=.05   P=.20    rho

  A      5     59.7     77.5       19.32    10.07       17.8     n/s     n/s    -.07
  B      5     33.2     62.1       18.41    11.46       28.9     n/s      s     -.10
  C      5     42.0     62.0       19.07    31.98       20.0     n/s     n/s     .70
  D      5     44.6     53.2       38.01    35.53        8.8     n/s     n/s     .90
  E      5     32.2     71.6       17.43    14.72       37.8      s       s      .53

Sample population (N=25)
               42.54    65.21                           22.67
Total population (N=60)
               47.03    62.04                           15.01                    .17

*At a confidence level of P = .20, differences must be 21.6 or greater;
 at P = .05, differences must be 32.6 or greater.



For the 10-student samples, the obtained F = 1.27 was not significant
at the P = .05 level, where a critical value of F(9,40) = 2.124 was necessary
to reject the null hypothesis. The summary results are shown in Table II.



TABLE II
SUMMARY: SAMPLE OF 10 SUBJECTS

                    MEAN          STANDARD DEVIATION     %      SIGNIFICANCE
                                  (at .95 level)
SAMPLE  SIZE  Pretest  Posttest   Pretest  Posttest     Gain    P=.05   P=.20    rho

  1      10    36.3     53.3       15.50    23.26       17.0     NO      YES
  2      10    42.2     56.7       19.94    24.66       14.5     NO      YES
  3      10    40.7     51.9       22.02    22.16       11.2     NO      NO
  4      10    44.2     54.2       15.70    17.57       10.0     NO      NO
  5      10    35.4     52.2       23.7     23.3        16.8     NO      YES

SAMPLE GRAND MEAN (N=50)
               39.84    53.60                           13.76
POPULATION GRAND MEAN (N=60)
               43.16    61.88                           18.72

At confidence level P = .05, differences must be greater than 19.56;
at P = .20, differences must be greater than 12.76.



The two data sets were regrouped into a 2 x 5 two-way ANOVA to check
for interaction effects and to assume independence of the pretest and
posttest measures.
TABLE III
SUMMARY: TWO-WAY ANOVA
5 STUDENT SAMPLES

SOURCE             SS         DF    M-SQUARE      F       df      SIGNIFICANCE*

TEST              6498.0       1    6498.0      9.886    1,40     P = .005
GROUPS            2895.88      4     723.97     1.101    4,40     NS
TEST X GROUPS     1297.00      4     324.25      .493    4,40
SSE              26290.81     40     657.27
SST              36981.69     49     754.73

*Tabled F(1,40) = 8.8278 at P = .005 (Owen, 1962)
        F(4,40) = 2.606  at P = .05

TABLE IV
SUMMARY: TWO-WAY ANOVA
10 STUDENT SAMPLES

SOURCE             SS         DF    M-SQUARE      F       df      SIGNIFICANCE*

TESTS             4830.25      1    4830.25     9.9219   1,90     P = .005
GROUPS             519.84      4     129.95      .2669   4,90     NS
TEST X GROUP       204.40      4      51.10      .1049   4,90
SSE              43814.11     90     486.82
SST              49368.60     99     498.67

*Not tabled. F(1,80) = [...] at P = .005
             F(4,80) = 2.72  at P = .05
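
The F ratios in Tables III and IV follow from the tabled sums of squares and degrees of freedom: each mean square is SS/DF, and each F is the effect mean square divided by the error mean square. The sketch below simply re-derives them as a consistency check; the function name `f_ratios` and the dictionary layout are illustrative assumptions, while the numbers are copied from the two tables.

```python
def f_ratios(ss, df, error_key="SSE"):
    """Return F = MS_effect / MS_error for every non-error source."""
    ms = {source: ss[source] / df[source] for source in ss}
    return {source: ms[source] / ms[error_key]
            for source in ms if source != error_key}

# Values copied from Table III (5-student samples).
ss_5 = {"TEST": 6498.0, "GROUPS": 2895.88, "TEST X GROUPS": 1297.00, "SSE": 26290.81}
df_5 = {"TEST": 1, "GROUPS": 4, "TEST X GROUPS": 4, "SSE": 40}

# Values copied from Table IV (10-student samples).
ss_10 = {"TESTS": 4830.25, "GROUPS": 519.84, "TEST X GROUP": 204.40, "SSE": 43814.11}
df_10 = {"TESTS": 1, "GROUPS": 4, "TEST X GROUP": 4, "SSE": 90}

f_5 = f_ratios(ss_5, df_5)
f_10 = f_ratios(ss_10, df_10)
for name, table in (("Table III", f_5), ("Table IV", f_10)):
    for source, f_value in table.items():
        print(name, source, round(f_value, 3))

# The component sums of squares should add up to the tabled totals (SST).
total_5 = sum(ss_5.values())
total_10 = sum(ss_10.values())
```

The recomputed ratios agree with the tabled F values to within rounding, and the component sums of squares add up to the tabled SST in both tables.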
DISCUSSION

RESULTS IN 5 STUDENT SAMPLE

The obtained F = 1.81 was not significant at the p = .05 level,
where a critical value of F(9,40) = 2.124 was necessary to reject the
no-difference hypothesis. This suggests that the differences between the
sample test scores are not significant at the p = .05 level; however, as
noted in the five-subject case, the value of the posttest percent correct
in every sample is above the 50% value used as one of the decision criteria
in the evaluation process. All gains are positive and four of the five
are near or above the 20% criterion. (The 17.8% gain of Group A is
acceptable because 20% is not exact for a sample of nine items; i.e.,
the cutoff is between 11% and 22%.)
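
The granularity argument can be made concrete. On a nine-item test one item is worth about 11.1 percentage points, so a 20% gain falls between one item (11%) and two items (22%). The sketch below applies the 50% posttest criterion and the 20% gain criterion to the posttest means and gains reported in Table I. Treating "near" as within half an item of 20% is an assumption made here for illustration only; the Project describes that judgment as subjective.

```python
N_ITEMS = 9
ITEM_STEP = 100.0 / N_ITEMS  # one item is about 11.1 percentage points

def meets_criteria(posttest_pct, gain_pct, post_cut=50.0, gain_cut=20.0,
                   tolerance=ITEM_STEP / 2):
    """Return (posttest_ok, gain_ok) for one sample.

    Assumption: a gain counts as "near" the criterion when it is within
    half an item's worth of percentage points below the 20% cutoff.
    """
    posttest_ok = posttest_pct >= post_cut
    gain_ok = gain_pct >= gain_cut - tolerance
    return posttest_ok, gain_ok

# (posttest mean, % difference) for the five samples, in Table I order;
# the first entry is Group A.
table_i = [(77.5, 17.8), (62.1, 28.9), (62.0, 20.0), (53.2, 8.8), (71.6, 37.8)]

results = [meets_criteria(post, gain) for post, gain in table_i]
posttests_passing = sum(post_ok for post_ok, _ in results)
gains_passing = sum(gain_ok for _, gain_ok in results)
print(posttests_passing, gains_passing)  # 5 4
```

Under this reading, all five samples meet the posttest criterion and four of the five meet the gain criterion, which matches the counts stated in the text.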

The confidence level for each sample mean difference between posttest
and pretest means was computed for p = .20 and indicated under the
"significance" column. Note the differences were significant at the
stated level in four out of the five samples.

The rho values leave a great deal to be desired except to suggest
that four of the five were positive and three of the five are greater
than .50. The gain score model does not lend itself to high reliability
scores because the amount of gain affects the ranking on the posttest,
yet the only concern is that gain in fact takes place (Vargas, 1969;
Cox, 1966).

RESULTS IN 10 STUDENT SAMPLE

The obtained F = 1.27 was not significant at p = .05, suggesting that
the obtained scores from the pretest and posttest do not differ enough to
indicate a significant change in behavior as a result of seeing the
filmstrip. The two-way ANOVA, however, assumes some independence of
the pretest and posttest as measures and results in a significant
F = 9.92, p = .05, with non-significant F values for between groups or
interaction.

IMPLICATIONS 

The above results indicate that the samples of five students
give results which are within the parameters of decision established
for the Project. When samples were increased to N = 10, the standard
findings were supported for increasing sample size, i.e., scores were
within smaller ranges, variance between groups was reduced, and gains
were more standardized. Why the parameters of all groups tend to be
below the population parameters [...]ation present in the samples
selected. The resulting low posttest
scores of the samples cause the gain scores to be depressed more than
may be reflected in the population, causing all the gain scores to fall
below the 20% criterion. All posttests are above the 50% criterion,
and all gains are positive and of about a magnitude which should lead
one to realize some stability has been reached.

CONCLUSIONS 

Samples of 5 students seem to be large enough to establish
estimates of population parameters within the limits of four out of
five times. Larger samples do not add appreciable data or substantially
change the outcome of decisions obtained from the samples of five.